billboard hot 100
Global Beats, Local Tongue: Studying Code Switching in K-pop Hits on Billboard Charts
Sankaran, Aditya Narayan, Farahbakhsh, Reza, Crespi, Noel
Code switching, particularly between Korean and English, has become a defining feature of modern K-pop, reflecting both aesthetic choices and global market strategies. This paper is a primary investigation into the linguistic strategies employed in K-pop songs that achieve global chart success, with a focus on the role of code-switching and English lyric usage. A dataset of K-pop songs that appeared on the Billboard Hot 100 and Global 200 charts from 2017 to 2025, spanning 14 groups and 8 solo artists, was compiled. Using this dataset, the proportion of English and Korean lyrics, the frequency of code-switching, and other stylistic features were analysed. It was found that English dominates the linguistic landscape of globally charting K-pop songs, with both male and female performers exhibiting high degrees of code-switching and English usage. Statistical tests indicated no significant gender-based differences, although female solo artists tend to favour English more consistently. A classification task was also performed to predict performer gender from lyrics, achieving macro F1 scores up to 0.76 using multilingual embeddings and handcrafted features. Finally, differences between songs charting on the Hot 100 versus the Global 200 were examined, suggesting that, while there is no significant gender difference in English usage, higher English usage may be more critical for success in the US-focused Hot 100. The findings highlight how linguistic choices in K-pop lyrics are shaped by global market pressures and reveal stylistic patterns that reflect performer identity and chart context.
- North America > United States (0.24)
- Europe > France (0.04)
- Asia (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study > Negative Result (0.93)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
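The two lyric measures named in the "Global Beats, Local Tongue" abstract above (the share of English versus Korean, and the frequency of code-switching) can be illustrated with a minimal sketch. The abstract does not say how the authors computed them, so everything here is an assumption: tokens are split on whitespace, a token counts as Korean if it contains a Hangul syllable and as English if it contains a Latin letter, and a code-switch is any change of language between consecutive tokens. The function names and the sample line are invented for illustration.

```python
import re

# Assumption: language is inferred from script alone.
HANGUL = re.compile(r"[\uac00-\ud7a3]")  # Hangul Syllables block
LATIN = re.compile(r"[A-Za-z]")


def token_language(token):
    """Return 'ko', 'en', or None for tokens with no letters (e.g. punctuation)."""
    if HANGUL.search(token):
        return "ko"
    if LATIN.search(token):
        return "en"
    return None


def lyric_stats(lyrics):
    """Compute English/Korean token ratios and the number of language switches."""
    langs = []
    for tok in lyrics.split():
        lang = token_language(tok)
        if lang is not None:
            langs.append(lang)
    total = len(langs)
    english = sum(1 for lang in langs if lang == "en")
    # A switch is counted whenever two adjacent tokens differ in language.
    switches = sum(1 for a, b in zip(langs, langs[1:]) if a != b)
    return {
        "english_ratio": english / total if total else 0.0,
        "korean_ratio": (total - english) / total if total else 0.0,
        "switches": switches,
    }


if __name__ == "__main__":
    # Invented sample line, not taken from any charting song.
    print(lyric_stats("너를 사랑해 baby tonight 우리 함께"))
```

A script-based rule like this would misclassify romanized Korean as English, so a real analysis would likely need a proper language identifier.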
Beyond the Hook: Predicting Billboard Hot 100 Chart Inclusion with Machine Learning from Streaming, Audio Signals, and Perceptual Features
The advent of digital streaming platforms has recently revolutionized the landscape of the music industry, with the ensuing digitalization providing structured data collections that open new research avenues for investigating popularity dynamics and mainstream success. The present work explored which determinants hold the strongest predictive influence for a track's inclusion in the Billboard Hot 100 charts, including streaming popularity, measurable audio signal attributes, and probabilistic indicators of human listening. The analysis revealed that popularity was by far the most decisive predictor of Billboard Hot 100 inclusion, with considerable contribution from instrumentalness, valence, duration and speechiness. Logistic Regression achieved 90.0% accuracy, with very high recall for charting singles (0.986) but lower recall for non-charting ones (0.813), yielding balanced F1-scores around 0.90. Random Forest slightly improved performance to 90.4% accuracy, maintaining near-perfect precision for non-charting singles (0.990) and high recall for charting ones (0.992), with F1-scores up to 0.91. Gradient Boosting (XGBoost) reached 90.3% accuracy, delivering a more balanced trade-off by improving recall for non-charting singles (0.837) while sustaining high recall for charting ones (0.969), resulting in F1-scores comparable to the other models.
- North America > United States (0.14)
- Europe > Greece > West Greece > Patra (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
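The "Beyond the Hook" abstract above reports accuracy together with per-class precision, recall, and F1 for charting and non-charting singles. As a reminder of how those figures relate to one another, the sketch below derives them from a binary confusion matrix using the standard definitions. The function names are hypothetical and the counts in the usage example are invented; they are not the paper's data.

```python
def per_class_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class, given its own TP/FP/FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def binary_report(tp, fn, fp, tn):
    """Report for a charting (positive) vs non-charting (negative) classifier.

    tp: charting songs predicted charting
    fn: charting songs predicted non-charting
    fp: non-charting songs predicted charting
    tn: non-charting songs predicted non-charting
    """
    charting = per_class_metrics(tp=tp, fp=fp, fn=fn)
    # From the non-charting class's point of view the roles are mirrored.
    non_charting = per_class_metrics(tp=tn, fp=fn, fn=fp)
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "charting": charting,
        "non_charting": non_charting,
        "macro_f1": (charting["f1"] + non_charting["f1"]) / 2,
    }


if __name__ == "__main__":
    # Invented counts for illustration only.
    print(binary_report(tp=90, fn=10, fp=20, tn=80))
```

High recall on one class paired with lower recall on the other, as the abstract describes for Logistic Regression, corresponds to a model that errs toward predicting "charting".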
Reasoning Court: Combining Reasoning, Action, and Judgment for Multi-Hop Reasoning
While large language models (LLMs) have demonstrated strong capabilities in tasks like question answering and fact verification, they continue to suffer from hallucinations and reasoning errors, especially in multi-hop tasks that require integration of multiple information sources. Current methods address these issues through retrieval-based techniques (grounding reasoning in external evidence), reasoning-based approaches (enhancing coherence via improved prompting), or hybrid strategies combining both elements. One prominent hybrid method, ReAct, has outperformed purely retrieval-based or reasoning-based approaches; however, it lacks internal verification of intermediate reasoning steps, allowing potential errors to propagate through complex reasoning tasks. In this paper, we introduce Reasoning Court (RC), a novel framework that extends iterative reasoning-and-retrieval methods, such as ReAct, with a dedicated LLM judge. Unlike ReAct, RC employs this judge to independently evaluate multiple candidate answers and their associated reasoning generated by separate LLM agents. The judge selects the answer that it considers the most factually grounded and logically coherent based on the presented reasoning and evidence, or synthesizes a new answer using available evidence and its pre-trained knowledge if all candidates are inadequate, flawed, or invalid. Evaluations on multi-hop benchmarks (HotpotQA, MuSiQue) and a fact-verification benchmark (FEVER) demonstrate that RC consistently outperforms state-of-the-art few-shot prompting methods without task-specific fine-tuning.
- Asia > Timor-Leste (0.29)
- North America > United States > Colorado (0.05)
- North America > United States > Indiana > Monroe County > Bloomington (0.05)
- Transportation > Air (1.00)
- Media > Television (1.00)
- Media > Film (1.00)
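The control flow described in the Reasoning Court abstract above (separate agents propose answers with reasoning, then a judge either picks one or synthesizes a new answer) can be sketched roughly as follows. This is not the authors' implementation: the `Candidate` type, the callable signatures, and the convention that the judge returns `None` when it rejects every candidate are all assumptions made for illustration. In the abstract the judge itself performs the synthesis; here that step is split into a separate hypothetical `synthesize` callable only to keep the flow visible.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    answer: str
    reasoning: str  # the agent's reasoning trace and retrieved evidence


# Hypothetical interfaces; in practice each would wrap an LLM call.
Agent = Callable[[str], Candidate]
Judge = Callable[[str, List[Candidate]], Optional[int]]
Synthesizer = Callable[[str, List[Candidate]], str]


def reasoning_court(
    question: str,
    agents: List[Agent],
    judge: Judge,
    synthesize: Synthesizer,
) -> str:
    """Collect candidates from independent agents, then let the judge decide."""
    candidates = [agent(question) for agent in agents]
    # Assumed convention: the judge returns the index of the best candidate,
    # or None when it finds every candidate inadequate.
    choice = judge(question, candidates)
    if choice is None:
        return synthesize(question, candidates)
    return candidates[choice].answer
```

With stub callables in place of LLM calls, the function returns either one agent's answer or the synthesized fallback, which makes the two branches easy to exercise in isolation.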
Song Hit Prediction: Predicting Billboard Hits Using Spotify Data
In this work, we attempt to solve the Hit Song Science problem, which aims to predict which songs will become chart-topping hits. We constructed a dataset with approximately 1.8 million hit and non-hit songs and extracted their audio features using the Spotify Web API. We tested four models on our dataset. Our best model was a random forest, which was able to predict Billboard song success with 88% accuracy.
- Media > Music (1.00)
- Leisure & Entertainment (1.00)